# Create vectors
x <- c(10, 11, 12, 13, 14, 15)
y <- c(20, 21, 22, 23, 24, 25)
z <- c(30, 31, 32, 33, 34, 35)

StartR Workshop
University of Konstanz
November 23, 2024
Photo courtesy of @markusspiske
Matrices are combinations of vectors. Like vectors, they can only contain values of the same type (e.g., numeric or character). If you combine vectors of different types, they will be coerced into the same type.
There are two main ways to create matrices in R:
- cbind() and rbind()
- matrix()

The cbind() function combines vectors into a matrix by binding them together by column.
The rbind() function combines vectors into a matrix by binding them together by row.
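As a minimal sketch of both functions, the example below reuses the vectors x, y, and z from the slides. The names m_col, m_row, and m_mixed are illustrative and not part of the original material. The last call also shows the type coercion mentioned earlier.

```r
# Same vectors as defined in the slides
x <- c(10, 11, 12, 13, 14, 15)
y <- c(20, 21, 22, 23, 24, 25)
z <- c(30, 31, 32, 33, 34, 35)

# Bind by column: each vector becomes one column (6 rows, 3 columns)
m_col <- cbind(x, y, z)
dim(m_col)

# Bind by row: each vector becomes one row (3 rows, 6 columns)
m_row <- rbind(x, y, z)
dim(m_row)

# Mixing numeric and character vectors coerces every value to character
m_mixed <- cbind(x, c("a", "b", "c", "d", "e", "f"))
is.character(m_mixed)
```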
The matrix() function is an explicit way to create matrices from scratch. It takes the following arguments:
- data: a vector containing the data
- nrow: the number of rows
- ncol: the number of columns
- byrow: logical value indicating whether the matrix should be filled by column (FALSE, default) or by row (TRUE)

The matrix() function can arrange the same data into a variety of matrices.
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
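The calls that produced the three outputs above are not shown. Assuming the data are the integers 1 to 10, calls like the following would give the same matrices. The object names are illustrative.

```r
# 2 rows, filled column by column (byrow = FALSE is the default)
m_by_col <- matrix(1:10, nrow = 2)

# 2 rows, filled row by row
m_by_row <- matrix(1:10, nrow = 2, byrow = TRUE)

# 2 columns, filled column by column; the number of rows is inferred
m_two_cols <- matrix(1:10, ncol = 2)

m_by_col
m_by_row
m_two_cols
```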
Remember that vectors are indexed like this: x[i]
Matrices are indexed like this: x[i, j]
Use vectors to index multiple rows and/or columns.
Getting multiple row values
Getting multiple column values
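A short sketch of these indexing patterns, using a small hypothetical matrix m that does not appear in the original slides:

```r
# Hypothetical 4 x 5 matrix, filled column by column
m <- matrix(1:20, nrow = 4)

# Single value: row 2, column 3
m[2, 3]

# Multiple rows: leave the column index empty to keep all columns
rows_1_3 <- m[c(1, 3), ]

# Multiple columns: leave the row index empty to keep all rows
cols_2_4 <- m[, c(2, 4)]

rows_1_3
cols_2_4
```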
Photo courtesy of @kolbymilton
Recall that letters is a vector of the alphabet. Create a matrix m1 with 3 rows and 5 columns using the first 15 letters of the alphabet.
Use cbind() to attach a sixth column with the letters p, q, and r.
Extract the 2nd to 4th columns and assign them to a new matrix m2.
Create a new matrix m3 by removing the 3rd row of m2.
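One possible solution is sketched below. The exercise does not say how m1 should be filled, so the default column-wise filling is an assumption here.

```r
# 3 x 5 matrix from the first 15 letters (filled column by column, an assumption)
m1 <- matrix(letters[1:15], nrow = 3, ncol = 5)

# Attach a sixth column with p, q, and r
m1 <- cbind(m1, c("p", "q", "r"))

# Keep columns 2 to 4
m2 <- m1[, 2:4]

# A negative index drops the 3rd row
m3 <- m2[-3, ]

m3
```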
Dataframes are similar to matrices, but they can contain different types of data (e.g., numeric and character). Because of this flexibility, they are commonly used to store data in R.
Dataframes are often imported from external sources (e.g., Excel, SPSS, or CSV files). However, we can also create dataframes from scratch with the data.frame() function.
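As a sketch, the dataframe used in the following slides could be created like this. The values are taken from the outputs shown later in the slides; the original call itself is not preserved, so the exact form is an assumption.

```r
# Each argument becomes one column; all columns must have the same length
data <- data.frame(
  id  = 1:5,
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23)
)

data
```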
We can select rows based on their numerical index (slicing).
Numeric indexing (slicing)
Using dplyr::slice()
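A sketch of both approaches to row selection, assuming a small example dataframe built from the values shown elsewhere in the slides. The dplyr part only runs if the package is installed.

```r
# Example dataframe (values taken from the slide outputs)
data <- data.frame(
  id  = 1:5,
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23)
)

# Base R: rows 1 and 3, all columns (note the trailing comma)
rows_base <- data[c(1, 3), ]
rows_base

# dplyr: the same rows with slice(); skipped if dplyr is not available
if (requireNamespace("dplyr", quietly = TRUE)) {
  rows_dplyr <- dplyr::slice(data, c(1, 3))
  print(rows_dplyr)
}
```

Base R keeps the original row names (1 and 3) in the result.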
Numeric indexing
Using dplyr::select()
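Column selection follows the same pattern. The sketch below again assumes a small example dataframe; the dplyr part only runs if the package is installed.

```r
# Example dataframe (values taken from the slide outputs)
data <- data.frame(
  id  = 1:5,
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23)
)

# Base R: columns by position or by name
cols_by_index <- data[, c(1, 3)]
cols_by_name  <- data[, c("id", "age")]
cols_by_index

# dplyr: select() takes unquoted column names; skipped if dplyr is not available
if (requireNamespace("dplyr", quietly = TRUE)) {
  cols_dplyr <- dplyr::select(data, id, age)
  print(cols_dplyr)
}
```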
$ operator
Retrieving a single column of a dataframe is so common that R provides its own shortcut for this task: the $ operator.
[1] 99 46 23 54 23
[1] "m" "m" "m" "f" "f"
Note that the $ operator returns the variable as a vector.
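The two outputs above are consistent with calls like data$age and data$sex. A minimal sketch, assuming the example dataframe is defined as below:

```r
# Example dataframe (values taken from the slide outputs)
data <- data.frame(
  id  = 1:5,
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23)
)

# $ returns the column as a plain vector
data$age
data$sex

# Double brackets give the same vector
age_vec <- data[["age"]]

# Single brackets with a name keep the dataframe structure
age_df <- data["age"]
```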
The $ operator is a simple way to add new columns to a dataframe.
# Adding a new character and a new numeric column
data$condition <- c("con", "con", "exp", "con", "exp")
data$score <- c(5.5, 2.3, 4.7, 6.7, 3.0)
data
  id sex age condition score
1  1   m  99       con   5.5
2  2   m  46       con   2.3
3  3   m  23       exp   4.7
4  4   f  54       con   6.7
5  5   f  23       exp   3.0
An alternative is the dplyr::mutate() function.
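A sketch of the mutate() variant next to the $ version, assuming the same example dataframe. The names data_base and data_dplyr are illustrative, and the dplyr part only runs if the package is installed.

```r
# Example dataframe (values taken from the slide outputs)
data <- data.frame(
  id  = 1:5,
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23)
)

# Base R: add columns with $
data_base <- data
data_base$condition <- c("con", "con", "exp", "con", "exp")
data_base$score <- c(5.5, 2.3, 4.7, 6.7, 3.0)

# dplyr: mutate() returns a new dataframe with the added columns
if (requireNamespace("dplyr", quietly = TRUE)) {
  data_dplyr <- dplyr::mutate(
    data,
    condition = c("con", "con", "exp", "con", "exp"),
    score     = c(5.5, 2.3, 4.7, 6.7, 3.0)
  )
  print(data_dplyr)
}
```

Unlike the $ assignment, mutate() does not modify data in place; the result has to be assigned to keep it.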
We have already used the names of columns to select them. To see all column names of a dataframe, use the names() function.
[1] "id" "sex" "age" "condition" "score" "time"
[7] "weight"
The names can be changed by assigning new values.
[1] "id" "sex" "age" "group" "score" "time" "weight"
An alternative is the dplyr::rename() function.
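A sketch of both ways to rename the condition column to group. The example dataframe below has fewer columns than the one in the slides, and the dplyr part only runs if the package is installed.

```r
# Reduced example dataframe (an assumption; the slide version has more columns)
data <- data.frame(
  id        = 1:5,
  condition = c("con", "con", "exp", "con", "exp")
)

# Inspect the current names
names(data)

# Base R: assign a new value to the matching element of names()
data_base <- data
names(data_base)[names(data_base) == "condition"] <- "group"

# dplyr: rename() uses the form new_name = old_name
if (requireNamespace("dplyr", quietly = TRUE)) {
  data_dplyr <- dplyr::rename(data, group = condition)
  print(names(data_dplyr))
}
```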
Photo courtesy of @siora18
Create a dataframe demo based on the table below.
Correct the name of the second column to height.
Using the $ operator, convert height from cm to m.
Compute the average height.
Select rows with above-average height.
| name | weight |
|---|---|
| Alice | 165 |
| Bob | 175 |
| Charlie | 180 |
| David | |
| Eva | 160 |
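One possible solution is sketched below. It assumes that the empty cell for David is a missing value (NA) and that missing values should be ignored when computing the mean.

```r
# Dataframe from the table; David's empty cell is entered as NA (an assumption)
demo <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eva"),
  weight = c(165, 175, 180, NA, 160)
)

# Correct the name of the second column
names(demo)[2] <- "height"

# Convert cm to m using the $ operator
demo$height <- demo$height / 100

# Average height, ignoring the missing value
avg_height <- mean(demo$height, na.rm = TRUE)

# Rows with above-average height; which() drops the NA comparison
tall <- demo[which(demo$height > avg_height), ]
tall
```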
Photo courtesy of @pawel_czerwinski
Dataframes often contain factors that are used to represent categorical variables (e.g., sex, education level, blood type).
A factor can contain only predefined values (called levels) with unique labels. Factors are created using the factor() function.
[1] "m" "m" "m" "f" "f"
[1] m m m f f
Levels: f m
Factors are very similar to character vectors, but they are treated differently in many statistical analyses and data visualizations.
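A minimal sketch of creating a factor and inspecting it. The name sex_f is illustrative.

```r
# Character vector, as in the slides
sex <- c("m", "m", "m", "f", "f")

# Convert to a factor; levels are sorted alphabetically by default
sex_f <- factor(sex)
sex_f

# The predefined values
levels(sex_f)

# Internally, each value is stored as an integer code pointing to a level
as.integer(sex_f)
```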
We can change the order of the levels by explicitly specifying them in the factor() function.
# Create a character vector
sex <- c("m", "m", "m", "f", "f")
# Order: "f", "m"
factor(sex, levels = c("f", "m"))
[1] m m m f f
Levels: f m
# Order: "m", "f"
factor(sex, levels = c("m", "f"))
[1] m m m f f
Levels: m f
This can be useful for reordering the levels in an analysis or plot (e.g., to change the order of the bars for males and females in a barplot).
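The effect of the level order can be seen in a frequency table, which lists categories in level order (a bar plot of such a table would draw its bars in that order). A short sketch with illustrative object names:

```r
sex <- c("m", "m", "m", "f", "f")

# Default (alphabetical) order versus an explicit order
tab_default   <- table(factor(sex))
tab_reordered <- table(factor(sex, levels = c("m", "f")))

names(tab_default)
names(tab_reordered)
```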
We can also rename the levels of a factor by specifying the new names in the labels argument of the factor() function.
# Create a character vector
sex <- c("m", "m", "m", "f", "f")
# Ordinary factor
factor(sex, levels = c("f", "m"))
[1] m m m f f
Levels: f m
# Factor with renamed levels
factor(sex, levels = c("m", "f"), labels = c("male", "female"))
[1] male   male   male   female female
Levels: male female
This can be useful for relabelling the levels in an analysis or plot.